[FLINK-39149][mysql-cdc] Implement LATEST mode GTID merging to prevent replaying pre-checkpoint transactions with non-contiguous ranges#4286
Conversation
…t replaying pre-checkpoint transactions with non-contiguous ranges
…t replaying pre-checkpoint transactions with non-contiguous ranges
There was a problem hiding this comment.
Pull request overview
This pull request fixes a critical bug (FLINK-39149) in MySQL CDC's GTID merging logic for "LATEST" mode. When recovering from a checkpoint with non-contiguous GTID ranges (e.g., aaa-111:5000-8000 instead of aaa-111:1-8000), the previous implementation would cause MySQL to replay transactions that occurred before the checkpoint, leading to duplicate data processing.
Changes:
- Introduced
GtidUtils.computeLatestModeGtidSet()method to properly handle GTID merging for LATEST mode with non-contiguous ranges - Updated
MySqlStreamingChangeEventSource.filterGtidSet()in both mysql-cdc and oceanbase-cdc connectors to use the new method - Added comprehensive unit and integration tests to prevent regression
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
flink-connector-mysql-cdc/src/main/java/io/debezium/connector/mysql/GtidUtils.java |
Adds new computeLatestModeGtidSet method that fixes old channels' GTID ranges and uses full server GTID for new channels |
flink-connector-mysql-cdc/src/main/java/io/debezium/connector/mysql/MySqlStreamingChangeEventSource.java |
Updates LATEST mode branch in filterGtidSet to use new computeLatestModeGtidSet method |
flink-connector-oceanbase-cdc/src/main/java/io/debezium/connector/mysql/MySqlStreamingChangeEventSource.java |
Mirrors the mysql-cdc changes for consistency across connectors |
flink-connector-mysql-cdc/src/test/java/io/debezium/connector/mysql/GtidUtilsTest.java |
Adds parameterized tests for various GTID scenarios including purged GTIDs and source filters |
flink-connector-mysql-cdc/src/test/java/io/debezium/connector/mysql/FilterGtidSetTest.java |
New integration test file ensuring the fix works end-to-end through filterGtidSet method |
flink-connector-mysql-cdc/pom.xml |
Adds Mockito 3.12.4 dependency for testing |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
What happens if in checkpoint gtid is: |
@mielientiev Restart from |
ruanhang1993
left a comment
There was a problem hiding this comment.
LGTM. Could this issue be covered by a test to verify the reliability of the code fix?
|
…t replaying pre-checkpoint transactions with non-contiguous ranges (apache#4286)
…t replaying pre-checkpoint transactions with non-contiguous ranges (apache#4286)
This close https://issues.apache.org/jira/browse/FLINK-39149
This pull request addresses a critical bug in GTID merging logic for MySQL CDC connectors, specifically in "LATEST" new-channel-position mode. The changes ensure that, when recovering from a checkpoint with non-contiguous GTID ranges, MySQL does not replay pre-checkpoint transactions. The fix is implemented by introducing a new method,
computeLatestModeGtidSet, and updating the merging logic to use it.